
Add Mixtral #2196


Open: wants to merge 18 commits into master

Conversation

@kanpuriyanawab (Collaborator) commented Apr 2, 2025:

This PR adds Mixtral to Keras Hub.

Reference

Mixtral output matching:

[Screenshots: Mixtral output matching, 2025-04-20]

@kanpuriyanawab marked this pull request as ready for review April 10, 2025 08:40
@kanpuriyanawab (Author) commented:

Output matching:

[Screenshot: output matching]

@divyashreepathihalli (Collaborator) left a comment:

Left a few comments! Please provide a demo Colab.

)
self._query_dense.build(inputs_shape)

self._key_dense = keras.layers.EinsumDense(
Reviewer comment:

Update the layer names to be compatible with `enable_lora`.
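
For illustration, a minimal sketch of the requested naming, assuming `Backbone.enable_lora` locates its target projections by layer name (names like `query`/`value`, as in other KerasHub attention layers); the dimensions and einsum equation here are illustrative, not Mixtral's actual config:

```python
import keras

# Illustrative dimensions only.
hidden_dim, num_query_heads, head_dim = 512, 8, 64

# enable_lora() matches projection layers by name, so the query/value
# EinsumDense layers must carry the expected names.
query_dense = keras.layers.EinsumDense(
    equation="bqm,muh->bquh",
    output_shape=(None, num_query_heads, head_dim),
    name="query",
)
value_dense = keras.layers.EinsumDense(
    equation="bqm,muh->bquh",
    output_shape=(None, num_query_heads, head_dim),
    name="value",
)
```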

@keras_hub_export("keras_hub.models.MixtralBackbone")
class MixtralBackbone(Backbone):
"""
The Mixtral Transformer core architecture with hyperparameters.
Reviewer comment:

The docstring's first line should directly follow the opening `"""`.

Reviewer comment:

This still needs to be changed to: `"""The Mixtral Transformer core architecture with hyperparameters.`
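
For reference, the requested form puts the summary line directly after the opening quotes:

```python
@keras_hub_export("keras_hub.models.MixtralBackbone")
class MixtralBackbone(Backbone):
    """The Mixtral Transformer core architecture with hyperparameters.

    (rest of the docstring unchanged)
    """
```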

preprocessor("League of legends")

# Tokenize a batch of sentences.
sentences = tf.constant(["Taco tuesday", "Fish taco please!"])
Reviewer comment:

why tf?
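
One backend-agnostic way to write that docstring example is to pass plain Python lists (or numpy arrays) instead of `tf.constant`, reusing the `preprocessor` from the snippet above:

```python
preprocessor("League of legends")

# Tokenize a batch of sentences; a plain list works on every backend.
sentences = ["Taco tuesday", "Fish taco please!"]
preprocessor(sentences)
```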

target_ids = keras.ops.roll(generation_ids, shift=-1, axis=1)

embeddings = None
with tf.GradientTape(watch_accessed_variables=True) as tape:
Reviewer comment:

why tf?

@kanpuriyanawab (Author) replied:

Borrowed docstring.

[Screenshot, 2025-04-16]

Reviewer comment:

We don't recommend using backend-specific examples. For generic usage, use `keras.ops` or numpy.

Reviewer comment:

There are some conflicts in the api directory due to the recent changes; please resolve.

@kanpuriyanawab (Author) replied:

Conflicts resolved.

@kanpuriyanawab (Author) replied:

> We don't recommend using backend-specific examples. For generic usage, use `keras.ops` or numpy.

@sachinprasadhs, as I mentioned above, there are already `tf.GradientTape` examples in existing model docstrings; those should be cleaned up in a separate PR.

Reviewer comment:

Let's not pile on the mess in new PRs. Let's keep it clean.

@kanpuriyanawab (Author) commented:

Mixtral output matching:

[Screenshots: Mixtral output matching, 2025-04-20]

@sachinprasadhs (Collaborator) left a comment:

Added a few more comments.

from keras import ops


# TODO: Deprecate this in favor of
Reviewer comment:

We don't support Keras 2 anymore in Keras Hub, so I guess you can get rid of this.

@kanpuriyanawab (Author) replied:

I forgot to remove this comment. But no, Keras's LayerNormalization doesn't produce the same results as this custom layernorm.
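
For context, Mixtral's reference norm is RMS-style (no mean-centering, with the statistic computed in float32), which is why `keras.layers.LayerNormalization` does not match it exactly. A minimal sketch of such a layer, with the class name and epsilon chosen for illustration rather than taken from the PR:

```python
import keras
from keras import ops


class RMSLayerNorm(keras.layers.Layer):
    """RMS norm: scale by 1/sqrt(mean(x^2)), without mean-centering."""

    def __init__(self, epsilon=1e-6, **kwargs):
        super().__init__(**kwargs)
        self.epsilon = epsilon

    def build(self, input_shape):
        self.scale = self.add_weight(
            name="scale", shape=(input_shape[-1],), initializer="ones"
        )

    def call(self, x):
        # Compute the statistic in float32 for numerical stability.
        x = ops.cast(x, "float32")
        var = ops.mean(ops.square(x), axis=-1, keepdims=True)
        x = x * ops.rsqrt(var + self.epsilon)
        return ops.cast(x, self.compute_dtype) * self.scale
```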

# Below is a workaround for `ops.triu` for Keras 2.
# TODO(tirthasheshpatel): Use `ops.triu` once Keras 2 support is
# removed.
# causal_mask = ops.triu(causal_mask, k=-self.sliding_window)
Reviewer comment:

Keras 2 support is removed now, so you can enable this.

@kanpuriyanawab (Author) replied:

`ops.triu`/`ops.tril` have issues with dynamic shapes on the TensorFlow backend (see `_mask_sliding_window` in `keras_hub/src/models/gemma/gemma_attention.py`), hence I chose to keep this as it is.
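
For readers, a sketch of the `ops.arange`-based approach referenced above: it builds the band mask by comparing query and key positions, avoiding `ops.triu`/`ops.tril` entirely (illustrative, not the PR's code):

```python
from keras import ops


def sliding_window_causal_mask(seq_len, sliding_window):
    # Query positions as a column, key positions as a row.
    i = ops.expand_dims(ops.arange(seq_len), axis=1)
    j = ops.expand_dims(ops.arange(seq_len), axis=0)
    # Causal: a query may only attend to keys at or before it.
    causal = ops.less_equal(j, i)
    # Window: keys must lie within `sliding_window` positions of the query.
    in_window = ops.greater(j, i - sliding_window)
    return ops.logical_and(causal, in_window)
```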

@kanpuriyanawab (Author) replied:

Updated the comment, though!

Reviewer comment:

Okay, can you remove the line "# Below is a workaround for `ops.triu` for Keras 2."?

@sachinprasadhs (Collaborator) left a comment:

Thanks! Left a few comments with small changes.

`tf.RaggedTensor` where the last dimension of the output is ragged.

If input is a scalar string (rank == 0), the layer will output a dense
`tf.Tensor` with static shape `[None]`.
Reviewer comment:

This needs to be corrected, since this is not specific to the TF backend.


init_kwargs=self.init_kwargs,
input_data=self.input_data,
expected_output_shape=(2, 5, 16),
run_quantization_check=False,
Reviewer comment:

Can you enable this test?
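
For clarity, the requested change, assuming this snippet comes from the standard `run_backbone_test` call in the KerasHub test harness (the `cls` argument is inferred from context):

```python
self.run_backbone_test(
    cls=MixtralBackbone,
    init_kwargs=self.init_kwargs,
    input_data=self.input_data,
    expected_output_shape=(2, 5, 16),
    run_quantization_check=True,  # was False; re-enables the check
)
```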


@divyashreepathihalli (Collaborator) left a comment:

What about the aux_loss implementation for Mixtral?
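
For context, the reference Hugging Face implementation of Mixtral includes a Switch-Transformer-style router load-balancing auxiliary loss. A sketch of that idea in `keras.ops`, illustrative rather than the PR's implementation:

```python
from keras import ops


def load_balancing_loss(router_probs, expert_indices, num_experts):
    """Auxiliary loss pushing the router to use experts evenly.

    Args:
        router_probs: (num_tokens, num_experts) softmax router outputs.
        expert_indices: (num_tokens, top_k) selected expert ids per token.
        num_experts: total number of experts.
    """
    # One-hot per selection: (num_tokens, top_k, num_experts).
    expert_mask = ops.one_hot(expert_indices, num_experts)
    # Fraction of tokens routed to each expert; a token counts once per
    # expert even if multiple top-k slots select it.
    tokens_per_expert = ops.mean(ops.max(expert_mask, axis=1), axis=0)
    # Mean router probability mass assigned to each expert.
    prob_per_expert = ops.mean(router_probs, axis=0)
    # Smallest when both distributions are balanced across experts.
    return num_experts * ops.sum(tokens_per_expert * prob_per_expert)
```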
